FAQ Mining Via List Detection

Authors

  • Yu-Sheng Lai
  • Kuao-Ann Fung
  • Chung-Hsien Wu
Abstract

This paper presents an approach to FAQ mining via a list detection algorithm. List detection is important for data collection because lists are widely used to represent data and information on the Web. By analyzing how FAQs are rendered on the Web, we found that FAQs are always fully or partially represented in a list-like form. There are two ways to author a list on the Web. One is to use specific list tags, e.g., a tag in HTML; lists authored this way can be easily detected by parsing those tags. The other is to use general-purpose tags instead of the list-specific ones. Unfortunately, many lists are authored in the second way. To detect such lists, we therefore present an algorithm that is independent of Web languages. By combining the algorithm with some domain knowledge, we detect and collect FAQs from the Web. The mining task achieved 72.54% recall and 80.16% precision.
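The abstract does not spell out the detection algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a "list-like form" shows up as text chunks sharing the same enclosing tag path, and that the "domain knowledge" can be approximated by checking whether most items end in a question mark. The names `TextPathCollector`, `detect_lists`, and `looks_like_faq`, and the thresholds `min_items` and `min_ratio`, are hypothetical. HTML is used here only for demonstration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextPathCollector(HTMLParser):
    """Record each text chunk together with the path of open tags around it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.items = []   # (tag_path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def detect_lists(html, min_items=3):
    """Return groups of texts whose tag path repeats at least min_items times.

    Assumption: repetition of the same path is treated as evidence of a
    list, regardless of which tags are involved.
    """
    collector = TextPathCollector()
    collector.feed(html)
    groups = {}
    for path, text in collector.items:
        groups.setdefault(path, []).append(text)
    return [texts for texts in groups.values() if len(texts) >= min_items]


def looks_like_faq(items, min_ratio=0.5):
    """Crude domain-knowledge filter: most items should end with '?'."""
    questions = [t for t in items if t.endswith("?")]
    return len(questions) / len(items) >= min_ratio


sample = (
    "<div>"
    "<p><b>What is X?</b></p><p>X is a thing.</p>"
    "<p><b>How do I use X?</b></p><p>Call it.</p>"
    "<p><b>Is X free?</b></p><p>Yes.</p>"
    "</div>"
)
for candidate in detect_lists(sample):
    print(looks_like_faq(candidate), candidate)
```

The sample page uses no dedicated list tags, yet the questions and the answers each form a repeated-path group. Grouping by path alone ignores adjacency and visual rendering, so a real detector would need stronger evidence than this sketch uses.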

Similar resources

Latent Semantic Inference for Agriculture FAQ Retrieval

An FAQ system can help users find answers to the problems that puzzle them. However, research on Chinese FAQ systems is still at a theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To enhance the efficiency, a small pool of the candidate question-answering pairs retrieved from the system for the follow-up work according to the concept of the agriculture dom...

Web-Based Communication Strategies Designed to Improve Intention to Minimize Risk for Colorectal Cancer: Randomized Controlled Trial

BACKGROUND People seek information on the Web for managing their colorectal cancer (CRC) risk but retrieve much personally irrelevant material. Targeting information pertinent to this cohort via a frequently asked question (FAQ) format could improve outcomes. OBJECTIVE We identified and prioritized colorectal cancer information for men and women aged 35 to 74 years (study 1) and built a websi...

The Viewpoints FAQ

The structure of this brief paper follows an emerging convention: the FAQ (Frequently Asked Questions) list. FAQs have grown out of Internet newsgroups where participants, tired of seeing the same questions repeated by newcomers, provide a list of canned answers to the most frequently asked questions. An FAQ also provides a covert role in defusing tiresome or unduly acrimonious debates by summarisi...

Template-Based Information Mining from HTML Documents

Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form, extracting useful information from a document usually involves information retrieval techniques or manual processing. This paper presents a novel approach to mining information from HTML documents using tree-structur...

FAQ: A Framework for Fast Approximate Query Processing on Temporal Data

Temporal queries on time-evolving data are at the heart of a broad range of business and network intelligence applications ranging from consumer behavior analysis, trend analysis, temporal pattern mining, sentiment analysis on social media, cyber security, and network monitoring. In this work, we present an innovative data structure called Fast Approximate Query-able (FAQ), which provides a unif...

Publication date: 2002